Abstract: Multimodal recommender systems aim to improve recommendation performance by exploiting multimodal information such as text and images. However, existing systems usually either integrate multimodal semantic information into item representations or use multimodal features to search for latent structure, without fully exploiting the correlation between the two. Therefore, a multimodal recommendation method integrating latent structures and semantic information is proposed. Based on users' historical behavior and multimodal features, user-user and item-item graphs are constructed to search for latent structure, and a user-item bipartite graph is built to learn from users' historical behavior. A graph convolutional network is used to learn the topological structure of each graph. To better integrate latent structures and semantic information, contrastive learning is employed to align the learned latent structure representations of items with their original multimodal features. Finally, evaluation experiments on three datasets demonstrate the effectiveness of the proposed method.
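The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the item-item graph is built, which propagation rule is used, or which contrastive objective is chosen. The sketch assumes a cosine-similarity kNN graph, weight-free mean-aggregation propagation, and an InfoNCE loss; the names `knn_item_graph`, `propagate`, `info_nce`, the temperature value, and the random projection `proj` (standing in for a learned projection) are all hypothetical.

```python
import numpy as np

def knn_item_graph(features, k):
    """Row-normalized kNN adjacency built from cosine similarity (assumed choice)."""
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)           # never pick an item as its own neighbour
    top_k = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar items
    adj = np.zeros_like(sim)
    rows = np.arange(sim.shape[0])[:, None]
    adj[rows, top_k] = 1.0
    return adj / adj.sum(axis=1, keepdims=True)

def propagate(adj, emb, num_layers):
    """Weight-free graph convolution: repeated neighbourhood averaging (assumed rule)."""
    out = emb
    for _ in range(num_layers):
        out = adj @ out
    return out

def info_nce(a, b, temperature=0.2):
    """InfoNCE loss: row i of `a` should match row i of `b` (assumed objective)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy data: 6 items with 8-dimensional features of a single modality.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
adj = knn_item_graph(feats, k=2)

# Hypothetical projection into a 4-dimensional embedding space; random here,
# whereas in a trained model it would be learned.
proj = rng.normal(size=(8, 4))
original = feats @ proj
latent = propagate(adj, original, num_layers=2)

# Alignment term between latent-structure representations and original features.
loss = info_nce(latent, original)
```

In a full model this alignment term would presumably be added to a ranking loss computed on the user-item bipartite graph and minimized jointly; how the two terms are weighted is not stated in the abstract.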